An Approximate String Matching Algorithm Based upon the Candidate Elimination Method
Abstract
In this paper, we consider the approximate string matching problem. We give a method to eliminate candidate locations in a text T at which no substring S can start such that the edit distance between S and a pattern P is smaller than or equal to a specified error bound k. Our method is simple to implement. Experimental results show that it is effective, especially for natural-language searching. For instance, for DNA data with a text 10,000 characters long, patterns 40 characters long and k = 3, our method eliminates 73% of the locations on average. For an English text of 4,500 characters and English patterns of 20 characters, it eliminates 99.9% of the locations on average when k = 1, and 73.3% when k = 6.
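The abstract does not state the elimination rule itself. The sketch below shows the general idea of discarding start positions before computing any edit distance. It uses a standard character-counting filter as a stand-in, which is not necessarily the authors' method. The filter relies on one fact: if some substring S starting at position i satisfies ed(S, P) ≤ k, then the length-m window of T at i shares at least m − k characters with P, counted with multiplicity. A position that fails this test can be eliminated. The names `surviving_starts` and `has_match_starting_at` are hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def surviving_starts(text: str, pattern: str, k: int) -> list[int]:
    """Start positions that a character-counting filter cannot eliminate.

    Assumed rule (a stand-in, not the paper's own): a start i survives only if
    the window text[i:i+m] shares at least m - k characters with the pattern,
    counted as a multiset intersection.
    """
    m = len(pattern)
    need = Counter(pattern)
    survivors = []
    for i in range(len(text)):
        window = Counter(text[i:i + m])           # recounted per window for clarity
        shared = sum((window & need).values())    # multiset intersection size
        if shared >= m - k:
            survivors.append(i)
    return survivors


def has_match_starting_at(text: str, pattern: str, k: int, i: int) -> bool:
    """Exact check: does some substring starting at i lie within distance k?"""
    m = len(pattern)
    for length in range(max(0, m - k), m + k + 1):
        if i + length > len(text):
            break
        if edit_distance(text[i:i + length], pattern) <= k:
            return True
    return False


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    pattern, k = "brwn", 1
    candidates = surviving_starts(text, pattern, k)
    hits = [i for i in candidates if has_match_starting_at(text, pattern, k, i)]
    print(f"{len(text) - len(candidates)} of {len(text)} starts eliminated")
    print("verified starts:", hits)
```

The filter is only a necessary condition, so every surviving start still goes through the exact check. It never discards a true match, because each matched character in an alignment with at most k edits falls inside the first m characters of the text at i. Recounting each window keeps the sketch short. A sliding window that updates one character in and one out would bring the filter to linear time.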
Similar resources
A New Filtration Method Based on the Locality Property for Approximate String Matching
In this paper, we consider the approximate string matching problem. We give a method to eliminate candidate locations in a text T at which no substring S can end such that the edit distance between S and a pattern P is smaller than or equal to a specified error bound k. Our method is simple to implement. Experimental results show that our method is effective. For instance, f...
An Algorithm for Color Matching of Textiles With Elimination of Limitation on Primaries
The proposed algorithm suggests a new method for determining the K/S values of primaries based on the linear least-squares technique. By applying the matrix pseudoinverse, a modification is introduced to eliminate the limitation on the number of applied dyes in the one-constant Kubelka-Munk theory. The selection of dyes for tristimulus matching is also done on the basis of the initial spectrophotom...
Adaptive Approximate Record Matching
Typographical data-entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...
Fast Approximate String Matching with Suffix Arrays and A* Parsing
We present a novel exact solution to the approximate string matching problem in the context of translation memories, where a text segment has to be matched against a large corpus, while allowing for errors. We use suffix arrays to detect exact n-gram matches, A* search heuristics to discard matches and A* parsing to validate candidate segments. The method outperforms the canonical baseline by a...
Publication date: 2008